Enable REST protocol between Aggregator and Collaborator#1500
Enable REST protocol between Aggregator and Collaborator#1500teoparvanov merged 1 commit intosecurefederatedai:developfrom
Conversation
541fe87 to
af43e36
Compare
af43e36 to
0c13600
Compare
cc563d9 to
ff9e063
Compare
ff9e063 to
22c4219
Compare
6683e0d to
e615e1d
Compare
d77306c to
0beb7fd
Compare
1b64503 to
66627a6
Compare
|
@psfoley / @teoparvanov is it fine if this refactor / task 'Retry mechanism config pullout, remove the hardcoding' be taken in a subsequent PR ? |
77ba225 to
bde8f0c
Compare
|
@ishaileshpant |
83aad3c to
74f2340
Compare
|
Thanks @ishaileshpant ! This is looking good. Btw, can you manually run https://github.com/ishaileshpant/openfl/actions/workflows/task_runner_flower_e2e.yml on your branch? I don't anticipate any major issues since it'll just run gRPC, but it'll be good to catch any unforeseen issues early |
Thanks for the inputs @kta-intel - have just validated and the default workflow works https://github.com/ishaileshpant/openfl/actions/runs/15036703510/job/42259707667 better to validate this with protocol set to 'rest' - wdyt? |
74f2340 to
098abe5
Compare
thanks for checking, and yes I agree it would be good to validate with |
098abe5 to
5dd5300
Compare
Validated with 'rest' protocol (w/o client auth) https://github.com/ishaileshpant/openfl/actions/runs/15099298504/job/42438127648 |
5dd5300 to
6a742d2
Compare
|
RestAPI is not working as expected - Collaborator is not able to connect with Aggregator |
9c316b7 to
9377caa
Compare
This has been fixed in latest push - the above error comes only in mTLS mode - TLS w/o client auth worked Now with the fix all the cases should be fine |
59327cf to
04abc9e
Compare
There was a problem hiding this comment.
This is looking good, @ishaileshpant. I've tested this locally with mTLS and I think this is in pretty good shape overall. I have a few minor comments from my review. Two other things I would request before merge:
- A REST E2E test for CI. This is a big enough feature I want to be confident this is in a continuous working state.
- Basic documentation. There should either be a minor page that explains how to toggle between gRPC / REST in the plan, or a note of this in the real world federation page (related to proxies that don't support gRPC).
After this is merged for OpenFL 1.9 I see an opportunity for:
- More function / constant reuse between gRPC and REST implementations. I called out the possibility of more consistency between the gRPC / REST datastream messages, but other areas we could streamline are disallowed TLS versions. This feels like it should be defined once somewhere else and imported.
- Additional tests for TLS / mTLS plan configurations. You are pretty well covered for functional tests for mTLS, but it would be good to test around just TLS as well (no client auth)
| # Serialize the TaskResults first | ||
| task_results_bytes = task_results.SerializeToString() | ||
| logger.debug(f"TaskResults serialized size: {len(task_results_bytes)} bytes") | ||
|
|
||
| # Create a DataStream message containing the TaskResults bytes | ||
| data_stream = base_pb2.DataStream(size=len(task_results_bytes), npbytes=task_results_bytes) | ||
|
|
||
| # Create an empty DataStream to signal end of stream | ||
| end_stream = base_pb2.DataStream(size=0, npbytes=b"") | ||
|
|
||
| # Serialize both messages | ||
| data_bytes = data_stream.SerializeToString() | ||
| end_bytes = end_stream.SerializeToString() | ||
|
|
||
| # Create length-prefixed stream format | ||
| stream_data = ( | ||
| struct.pack(">I", len(data_bytes)) # Length prefix for first message | ||
| + data_bytes # First message | ||
| + struct.pack(">I", len(end_bytes)) # Length prefix for second message | ||
| + end_bytes # Second message (empty message signals end) | ||
| ) |
There was a problem hiding this comment.
Is it necessary to encode the data stream this way? If so that's fine. I'm asking because it would be nice to have more symmetry with the gRPC implementation.
There was a problem hiding this comment.
The current implementation is more like a batched RPC call with a custom binary protocol format rather than true streaming and i kept it this way to support both HTTP/1.1 and enhance for HTTP/2 true streaming later on - if you agree then post this iteration we should look at chunked transfer to better support both http/1.1 and eventually http/2 by enhancing when needs arises - wdyt?
We already have this in place as @payalcha has pointed out - we can uncomment this post the PR is merged |
Totally agree, already have some ideas on doing this post the PR merge |
- add new AggregatorClientInterface to allow switching b/w grpc and rest - endhance existing AggregatorGRPCClient to start using AggregatorClientInterface - added new transport package for rest with AggregatorRESTClient implementing AggregatorClientInterface - added streaming api support with custom content-type - added various connection flag for streaming request - send additional header key "Sender" for better request logging at server side - aligned Rest and gRPC client for most of the init params - added AggregatorRESTServer and necesary changes in aggregator cli and federated/plan get_server method - added transport_protocol settings in defaults/network.yaml, defaulted the same to 'grpc' - reduced cyclomatic complexity of Rest Server - fixed protobuf streaming issue for v1/task/results API - added more detailed logging for task progression and metadata for each api calls - pinned Flask version to latest stable 3.1.0 - addressing review comments - 13th-May - added ping api and `collaborato` constructor hint for `AggregatorClientInterface` - added send_message_to_server in client and AggregatorClientInterface, Rest Server is already at parity - changed base uri for REST server to 'experimental/v1', adjusted the client and tests accordingly - fixed issue related to mTLS in REST server/client - disabled TLS 1.2 in both server/client rebased 21st.May.1 Signed-off-by: Shailesh Pant <shailesh.pant@intel.com>
04abc9e to
c3f3844
Compare
add new AggregatorClientInterface to allow switching b/w grpc and rest
endhance existing AggregatorGRPCClient to start using AggregatorClientInterface
added new transport package for rest with AggregatorRESTClient implementing AggregatorClientInterface
added streaming api support with custom content-type
added various connection flag for streaming request
send additional header key "Sender" for better request logging at server side
aligned Rest and gRPC client for most of the init params
added AggregatorRESTServer and necesary changes in aggregator cli and federated/plan get_server method
added transport_protocol settings in defaults/network.yaml, defaulted the same to 'grpc'
reduced cyclomatic complexity of Rest Server
fixed protobuf streaming issue for v1/task/results API
added more detailed logging for task progression and metadata for each api calls
pinned Flask version to latest stable 3.1.0
addressing review comments - 13th-May
added ping api and
collaboratoconstructor hint forAggregatorClientInterfaceadded send_message_to_server in client and AggregatorClientInterface, Rest Server is already at parity
changed base uri for REST server to 'experimental/v1', adjusted the client and tests accordingly
fixed issue related to mTLS in REST server/client
disabled TLS 1.2 in both server/client
rebased 21st.May.1
Add AggregatorRESTServer
Remove collaborator cli support for protocol switch, this should be done via Plan/Aggregator
Add extensive logging for both REST server and client
Transparently create either REST or gRPC server and client binding when running a federation based on Plan config
Add Unit level test for existing modules
Add unit test for new modules that are added
Remove debug logs added for testing purposed
Manual testing of various config and status update in PR
Refactor verbose debug/info level log messages and metadata that gets logged
Known issue - mTLS communication is not working - when tls and client_auth both are enabled - Debug in progress
Validate that FedEval feature works as well with REST protocol
Validate with at-least two workspaces manually for REST :
Summary
Enable rest protocol support for TaskRunner API
Type of Change (Mandatory)
Description (Mandatory)
This PR adds REST protocol support for TaskRunner API there by allowing running federation with REST protocol for seamless integration with managed/public API gateways
Unit Test report
Full Unit test report